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The Drosscl-Schwabl Forest Fire Model is one of the best studied models of non-conservative self- 
organised criticality. However, using a new algorithm, which allows us to study the model on large 
statistical and spatial scales, it has been shown to lack simple scaling. We thereby show that the 
considered model is not critical. This paper presents the algorithm and its parallel implementation 
in detail, together with large scale numerical results for several observables. The algorithm can 
easily be adapted to related problems such as percolation. 
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1. INTRODUCTION 

The assumption that SOC (Jensen, 1998) is the correct framework to describe and explain the ubiquity of power 
laws in nature, has been greatly supported by the development of non-conservative models, because natural processes 
are typically dissipative. Contrary to these models, analytical work has suggested, that the deterministic part of the 
dynamics must be conservative in order to obtain scale invariancc (Grinstein et al., 1990; Hwa and Kardar, 1989). 
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However, on a mean- field level, this is not necessarily true (Vespignani and Zapperi, 1998), which has been exemplified 
in an exact solution of a model, that has a forest fire- like driving (Pruessner and Jensen, 2002b). However, as a random 
neighbour model, the latter lacks spatial extension. 

The Drossel-Schwabl Forest Fire Model (DS-FFM) (Drossel and Schwabl, 1992) is one of the few spatially extended, 
dissipative models, which supposedly exhibit SOC. Contrary to the Olami-Feder-Christensen stick-slip model (Olami 
et ai, 1992), where criticality is still disputed (for recent results see for example (Boulter and Miller, 2003; Lise and 
Paczuski, 2001a, b)), for the DS-FFM the asymptotic divergence of several moments of its statistics, and therefore the 
divergence of an upper cutoff can be shown rigorously. Although this might be considered as a sign of criticality, it is 
far from being a sufficient proof. In equilibrium thermodynamics "criticality" usually refers to a divergent correlation 
length (Binney et at, 1998; Stanley, 1971) in the two-point correlation function, which is associated with a scale- 
invariant or power-law like behaviour. This is how the term "criticality" is to be interpreted in SOC: Observables 
need to be scale invariant^, i.e. power laws in the statistics. There are many examples of divergent moments without 
scale invariance, such as the over critical branching process (Harris, 1963) or over critical percolation (StaufFer and 
Aharony, 1994). 

Thus, there is a priori no reason to assume that the DS-FFM is scale free. However, there are many numerical 
studies, which suggest so (Christensen et ai, 1993; Clar et ai, 1994; Drossel and Schwabl, 1992), one of them, however, 
suggests the breakdown of simple scaling (Grassberger, 1993). Since an analytical approach is still lacking, numerical 
methods are required to investigate this problem. In this paper, we propose a new, very fast algorithm to simulate the 
DS-FFM with large statistics and on large scales. The implementation of the algorithm, has produced data of very 
high statistical quality. Some of the results have been already published elsewhere (Pruessner and Jensen, 2002a). 

The structure of the paper is as follows: The next section contains the definition of the model together with its 
standard observables and their relations. Then the algorithm is explained in detail. The section finishes with a 
detailed discussion on the changes necessary to run the algorithm on parallel or distributed machines. In the third 
section results for the two dimensional FFM are presented and analysed. The paper concludes with a summary in 
the fourth section. 



2. METHOD AND MODEL 

This section is mainly technical: After defining the model, all relevant details of the implementation are discussed. 
Apart from concepts such as the change from a tree oriented algorithm to a cluster oriented algorithm, concrete 
technical details are given, for example memory requirements and methods for handling histograms. The section also 
contains a description of the performance analysis of the implementation. A parallelised version of the algorithm is 
introduced an discussed in the last section. 



2.1. The Model 

A Forest Fire Model was first proposed by Bak, Chen and Tang (Bak et ai, 1990) and changed later by Drossel 
and Schwabl (Drossel and Schwabl, 1992) to what is now known as the Forest Fire Model (or DS-FFM as we call it): 
On a d dimensional lattice of linear length L, each site has a variable associated with it, which indicates the state of 
the site. This can either be "occupied" (by a tree), "burning" (occupied by a fire) or "empty" (ash). In each time 
step, all sites are updated in parallel according to the following rules: If a site is occupied and at least one of its 
neighbours is burning, it becomes burning in the next time step. If a site is occupied and none of its neighbours is 
burning, it becomes burning with probability /. If a site is empty, it becomes occupied with probability p. If a site is 
burning it becomes empty in the next time-step with probability one. As these probabilities become very small, they 
are better described as rates in a Poisson like process. From a simple analysis it is immediately clear (Clar et at, 
1994), that the model can become critical only in the limit p ^ and / ^ 0. In this limit, the burning process 
becomes instantaneous compared to all other processes (see also sec. 2.2.2) and can be represented by the algorithm 
shown in Fig. 1. 

Compared to the instantaneous burning, both of the remaining processes are slow. In section 2.2.2 it is shown that 
p ^ f is required (Clar et ai, 1994) for criticality, so that f/p < 1 and the algorithm in Fig. 1 can be written as 
Fig. 2, which is faster than the former, because the number of random choices of a site is reduced, but equivalent 
otherwise. 



In a finite system the distributions are not expected to be free of any scale, but to be dominated asymptotically by one scale only. 
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FOREVER { 

/* Choose a site randomly */ 
rn = random site; 

/* If empty occupy with probability p */ 
IF (rn empty) THEN { 

with probability p: rn=occupied; 
} ELSE { 

/* If occupied start a fire with probability f */ 
with probability f : 

burn entire cluster connected to rn; 

} 

} 

FIG. 1 The naive, basic algorithm of the DS-FFM 
FOREVER { 

/* The following line is without effect */ 
with probability p: { 

rn = randomly chosen site; 
IF (rn empty) THEN { 

rn=occupied; 
} ELSE { 

with probability f/p: 

burn entire cluster connected to rn; 

} 

} 

} 

FIG. 2 A faster algorithm, doing essentially the same as the one shown in Fig. 1. 

The line with probability p makes sure that the occupation attempt still happens with probability p and the 
burning attempt still occurs with pf /p = /. Of course, the line is completely meaningless, because the alternative, 
which occurs with probability 1 — p is no action at all. It therefore can be omitted. Then every randomly picked 
empty site will become occupied, while burning happens with the reduced probability f/p. 

This rescaling of probabilities is only possible in this form if the two processes are independent, which is the case 
because a new occupation can only occur for empty sites, while a burning attempt operates only on occupied sites. If 
both processes were to operate on the same type of site, a reduced probability (1 + f /p)^^ would decide between the 
two alternatives. 

The implementation shown in Fig. 2 (without the meaningless line) has been used for example in (Henley, 1993; 
Honecker and Peschel, 1997). However, probably for historical reasons, the model is usually (Clar et ai, 1994; 
Grassberger, 1993; Schenk et al., 2000) implemented as shown in Fig. 3, where trees are grown in chunks of p/ f 
between two lightning attempts. Although this means that sites become re-occupied only in chunks oi p/f, it turns 
out that apart from peaks in the histogram of the time series of global densities of occupied sites (Schenk et ai, 2000), 
the statistics do not depend on these details. Only in order to avoid any confusion, all data for this article have been 
produced by means of the algorithm in Fig. 3. Moreover this algorithm is much more suitable for parallelisation (see 
section 2.5). 

2.2. Statistical Quantities 

The objects of interest in the DS-FFM are clusters formed by occupied sites: Two trees belong to the same cluster, 
if there exists a path between them along nearest neighbouring, occupied sites. The cluster in the DS-FFM correspond 
to avalanches in sandpile-like models (Jensen, 1998). The cluster, which is burnt at each burning step can be examined 
more closely, so that various geometrical properties can be determined either as averages (and higher moments) or 
as entire distribution: Mass (in the following this term is used synonymously to size), diameter, time to burn it etc. 
The last property is better expressed as the maximum length for all paths parallel to the axes and fully within the 
given cluster, connecting the initially burnt tree and each tree within the same cluster. It is the maximum number of 
nearest neighbour moves one has to make to reach all sites in the same cluster, in this sense a "Manhattan distance" 
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FOREVER { 

/* This is just a loop to occupy the 

* right number of sites */ 
REPEAT p/f TIMES { 

rn = randomly chosen site; 

IF (rn empty) THEN {rn=occupied; } 

} 

rn = randomly chosen site; 
IF (rn occupied) THEN { 

burn entire cluster connected to rn; 

} 

} 



FIG. 3 The traditional implementation. 



(Cormcn et at, 1990a). As trees catch fire due to nearest neighbours only, this maximum distance is the total burning 
time of the entire cluster. In the definition above, the "time to burn" Tm becomes a purely geometrical property of 
the cluster and therefore independent from the actual implementation (see sec. 2.3.4) of the burning procedure. 



2.2.1. Cluster size distribution 

The most prominent property of the model, however, is the size distribution of the clusters, n(s), which is the 
single-site normalised number density of clusters of mass s, i.e. the number of clusters of size s per unit volume. The 
average cluster size, i.e. the average size of a cluster a randomly chosen occupied site belongs to, is correspondingly 
defined as 

s = ^-^ . (1) 

As indicated by the bar, n(s) denotes the expected distribution, i.e. something to be estimated from the observables. 
On average, the probability that a randomly chosen site belongs to a cluster of size s is then sn(s). If nj(s) denotes 
the cluster size distribution of the configuration at time t (see below), then one expects 

{ntis))^nis) . (2) 

where () denotes the ensemble average (as opposed to~, which denotes the average over sn(s; 9)). Assuming ergodicity, 
one has 



iL-^E^*-(^) (3) 



t=l 



for an arbitrary quantity At measured at each step t of the simulation. The limit exists for all bound observables At . 

Regarding the time t, it is worth noting that a step in the simulation is considered completed, i.e. t ^ t + 1 if the 
randomly chosen site for the lightning attempt was occupied, i.e. the attempt was successful, so that T is the number 
of burnt clusters. For sufficiently large systems, the changes of the system due to growing or lightning are almost 
negligible, and so are the differences between averages taken over all lightning attempts or all successful lightning 
attempts. Also, the distributions found directly before and directly after burning tend to the same expectation value 
for sufficiently large systems, see sec. 3.1.1. It is noted only for completeness, that in this paper the cluster size 
distribution nt{s) has been measured directly after the burning procedure. Therefore nt{s) does not include the 
cluster burnt at time step t, just like n(+i(s) does not in an implementation, where the distribution is measured before 
burning. 

Introducing 



Ms) (4) 



as average density of occupied sites, the expected distribution of burnt clusters is sn{s)/p. To see this, V^(s) is 
introduced, denoting the distribution of clusters burnt in the tth step of the simulation. This distribution contains 
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only one non-zero value for each t, namely V^{s) = 1 for the size s of the cluster burnt at time t, and 'Pj'(s) = for 
all other s. Therefore 

N 

s=l 

where N is the number of sites in the system, N = L'^, which is also the maximum mass of a cluster. Since the site 
where the fire starts is picked randomly, the cluster burnt in time step t + 1 is drawn randomly from the distribution 
nt{s) with a probability proportional to the mass of the cluster. The normalisation of the distribution sn{s) is given 
by (4), so that for t large enough, the effect of the initial condition can be neglected, 

{rns))^snis)/p. (6) 

In the stationary state the average number of trees, p is related to s^by (Clar et al., 1994) 

^ 1-p 



Op 



(7) 



This equation, as well as (6), is strictly only exact if the density of occupied sites is constant over the course of the 
growing phase. For very large system sizes (7) holds almost perfectly, as shown in Tab. Ill; however, note the remarks 
in Sec. 3.1.1. 

For a coherent picture (s) is introduced, which is the histogram of all clusters, i.e. J^s '^ti^) tli*^ number of 
clusters in the system at time t. According to the definition of n{s) it is 

(Vtis)) = Nn{s) , (8) 

and correspondingly 

s 

with (pt) = p. Since (6) and (8) differ on the RHS only by constants rather than by random variables, both distribution, 
V^is) and Vf{s), are estimators of the expected distribution n{s). Clearly, the burnt cluster distribution 'P^{s) is 
much sparser than than 'Pf{s) and the estimator for n{s) derived from this quantity, is therefore expected to have a 
significant larger standard deviation. On the other hand, its autocorrelation time is expected to be considerably smaller 
than that of (s), because on average only p/ f + 1 entries [pp/ f sites are occupied in each "growing loop" , which 
is repeated on average 1/p times) of the latter are changed between two subsequent measurements, corresponding to 
the number of newly occupied sites plus the cluster which is burnt down. So, "Pf (s) provides a much larger sample 
size, but is also expected to be much more correlated. In order to judge, whether it is wise to spend CPU time on 
calculating the full Vf[s) rather than only 'Pi{s), as it was done in the past (Clar et al., 1994), these competing 
effects need to be considered, by calculating the estimate for the standard deviation of the estimator of n(s) from 
both observables, which is discussed in detail in section 2.4. 

2.2.2. Timescales 

In order to obtain critical behaviour in the FFM, a double separation of time scales is required (Clar et al., 1996) 

/ « P « (^0 , (10) 

with some positive exponent v' . The left relation, / <C p, entails f /p ^ and therefore (10) entails p ^ and / ^ 0. 
This is also the case for 

/<P<1 , (11) 

and therefore leads to the same prescription to drive the system, however (10) entails (11) but not vice versa. This 
can be seen by noting that (10) entails the non-trivial relation p^+^/'^ <^ f <^ p. Some authors, however, just state 
(11) (Grassberger, 1993; Vespignani and Zapped, 1998). The three scales involved are due to three different processes 
and their corresponding rates: 
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1. The timescale on which the burning happens, the typical time of which is handwavingly estimated as the 
average number of sites in a burnt cluster, J oc p/ f. A more appropriate assumption is that the typical burning 
time scales like a power of the average cluster size (Clar et ai, 1996). This should be distinguished from the 
scaling of the average time it takes to burn a cluster, because the typical time represents the chracteristic scale 
of the burning time distribution, which might be very different from its average. 

2. The timescale of the growing, which is 1/p. 

3. The timescale of the lightning, 1//. 

Burning must be fast compared to growing, so that clusters are burnt down, before new trees grow on it edges (Clar 
et ai, 1996), i.e. (p/ f)" <C l/_p or {.f /pY » p- In order to obtain divergent cluster sizes, growing must be much faster 
than lightning, i.e. p^ f. Thus, the double separation reads as stated in (10). By making the burning instantaneous 
compared to all other processes, the dynamics effectively loses one timescale. In this case, the rates / and p, measured 
on this microscopic timescale, vanish, i.e. f ~ and p = 0, so that the right relation of (10) is perfectly met, provided 
that p/ f does not vanish. However, the ratio f /p remains finite, and / ^ p is still to be fulfilled. A finite f /p means 
that one rate provides a scale for the other. Measuring the rates on the macroscopic timescale, defined by the sequence 
of burning attempts, / becomes 1 in these new unities, and p becomes p/ f = 0~^. The notation = f /p corresponds 
to (Vespignani and Zapperi, 1998), which is, unfortunately, the inverse of 6* used in (Grassbergcr, 1993). Eq. (10) then 
means ^ 0. At first sight, this result seems paradoxical, since = is incompatible with instantenous burning's 
compliance with p 0'^ . However, this problem does not appear in the limit 0—^0. In a finite system, one cannot 
make 6 arbitrarily small, as the system will asymptotically oscillate between the two states of being completely filled 
and completely empty. On the other hand, for fixed and sufficiently large system sizes, a further increase in system 
size will leave the main observables, such as pt and (see Section 2.2.1), essentially unchanged. These asymptotic 
values, namely the observables at a given 9 in the thermodynamic limit, are to be measured. 

2.2.3. Scaling of the cluster size distribution 

Assuming that finite size effects do not play any role, i.e. for 6 not too small, the ansatz 

n{s;0) = s-^g{s/so{0)) (12) 

as obtained in percolation (Stauffer and Aharony, 1994) is reasonable for s larger than a fixed lower cutoff. In the 
following, the additional parameter in n{s;9) is omitted, whenever possible. The quantity So{9) is the upper cutoff 
and supposed to incorporate all 9 dependence of the distribution. It can be shown easily (Clar et ai, 1994) that the 
second moment of n(s]0) (see (1)) diverges in the limit 0—^0 and L —^ oo, so that sq must diverge with — > 0. 
Here, g{x) plays the role of a cutoff function, so that liuix^ao Qi^) = and for large x falls off faster than any power, 
because all moments of n(s; 9) are finite in a finite system. For finite x, g{x) can show any structure and does not have 
to be constant. However, assuming liuis„^oonis;0) finite, Gis/sg) can be regarded as constant in s for sufficiently 
large sq, so that n{s\0) behaves like a power law, s~^, for certain s. However, a priori it is completely unknown, 
whether sg is large enough in that sense and the only way to determine r directly from n(s; 0) is via a data collapse. 
It is already known that "simple scaling" (12) docs not apply in the presence of finite size effects (Schenk et ai, 2000). 

The assumption (12) states that the FFM is scale-free in the limit sq(0) — > oo and defines the exponent r which 
characterises the scale invariance. One cannot stress enough, that with the breakdown of (12), the proposed exponent is 
undefined, unless a new scaling behaviour is proposed. It has been pointed out that (12) certainly contains corrections 
(Pastor-Satorras and Vespignani, 2000). This asymptotic character of the universal scaling function is well known 
(Wegncr, 1972) from equilibrium critical phenomena. 

While Grassberger concludes that the ansatz (12) "cannot be correct" (Grassbergcr, 1993), this is rejected in 
(Schenk et ai, 2000). However, the latter authors do not actually investigate G{x) and simply plot their estimate of 
sn{s;0) vs. s/so{0). In the result section it is shown that there is no reason to believe that (12) could hold in any 
finite system. 

2.2.4. Other distributions 

The exponent r as defined in (12) can be related to exponents of other assumed power laws. To this end, the 
distribution P(s, Tm', 9) is introduced, which is the joint probability density function (PDF), for a cluster burnt to be 
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of mass s and burning time (see sec. 2.2) Tm at given 6. Then it is possible to define conditional expectation values 
as (Christensen et ai, 1991). 

E(s|Tm;0) = ^s'P(s',Tm;0) (13) 
EiTM\s;0) = ^Tm'P(s,Tm';0) . (14) 

Moreover it is clear that ??(s; 6) is just a marginal distribution, i.e. 

sn(s;^)-^P(s,TM';0) = Ps(s;^) . (15) 

Tm' 

In the assumed absence of any scale, it is reasonable to define for the distribution of Tm similar to (12) 

Ptm (Tm; 0) = Tm-'Gtm {Tu/TMom (16) 
and for the relation between E(s|Tm) and Tm: 

E(s|rM) cx Tm^' (17) 

To avoid confusion, it is important to keep in mind that the absence of scales is not a physical or mathematical 
necessity: The system could as well "self-organise" to any other, sufficiently broad distribution, which could have 
an intrinsic, finite scale, i.e. a natural constant characterising the features of the distribution. This looks much 
less surprising considering the fact that standard models of critical phenomena (Stanley, 1971) like the Ising model, 
possess such a scale everywhere apart from the critical point. 

An additional assumption is necessary in order to produce a scaling relation: 

PT^iTM;e)dTM - P,(E(s|TM);0)d(E(s|rM;0)) (18) 

where Ptm ^^-'^ denote the marginal distributions of P(s,Tm;0), which leads — assuming sufficiently large sq and 
Tmo — to 

6 = l + Ai'(T-2) (19) 

using Ps = sn{s;9) and (12). Eq. (18) is based on the idea that a cluster requiring burning time Tm is as likely to 
occur as a cluster of the size corresponding to the average taken conditional to the burning time Tm- If the distribution 
P{s,Tm;0) is very narrow, such that E(s|rM) is virtually the only value of s with non-vanishing probability^, this 
condition is met. However, the distribution can have any shape and still obey the assumption, as illustrated in Fig. 4. 

Scaling relation (19) can only be derived via (18), which cannot be mathematically correct, as Ps is actually only 
defined for integer arguments, while in general E(s|Tm) is not integer valued. However, the scaling relation might 
hold in some limit. 

The exponent defining the divergence of sq in (12) is defined as 

soiO) = 0-^ (20) 
leading together with (1) and (7) to the scaling relation (Clar et ai, 1996) 

A(3-r) = l . (21) 

The corresponding exponent for Tmq in (16) as 

Tmo(^) = 0-"' (22) 
The assumption Tmq — E(rM|so) oc Sq^^ then gives the scaling relation 

= A (23) 



^ The extreme case would be P{s,Tm;9) = S{s — /{T]vj))(;{Tm) with a monotonic function /(Tm) representing the conditional average. 
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E(s|Tm) 

s 



FIG. 4 A schematic joint PDF P(s, Tm'; S). The gray shading is used to indicate the density and the straight hnes indicate 
roughly the hmits of the distribution. While a narrower distribution would most easily obey (18), it does not necessarily have 
to be sharply peaked. In this example the weighted areas of the horizontal and the vertical stripes might be the same. They 
cross at the conditional averages. 

It is interesting to note that this assumption is consistent with the assumption that clusters, which have a size of 
the order so{9) need of the order Tmq time to burn. In that case one has Ptm (2\io; = Ps{so;9)ds and as 

Tmo oc Sq one has using (16) and (12): 

(l-fe)y=2-r (24) 

corresponding to (19) with (23). 
2.3. The Implementation 

In this section the new implementation of the DS-FFM is discussed. An implementation especially capable to 
handle large scales has been proposed by Honecker (Honecker, 1997) earlier. The most prominent feature of it is the 
bitwise encoding of the model, which significantly reduces memory requirements. Some of the properties investigated, 
profit from this scheme of bitwise encoding, because bitwise logical operators can be used to determine for example 
correlations, and operate on entire words "in parallel" . However, in this implementation it would have been inefficient 
to count all clusters, i.e. n{s) is determined via V^{s) rather than V^{s). 

In contrast to standard implementations (Clar et ai, 1994; Honecker and Peschel, 1997; Schcnk et ai, 2000), where 
n(s) is derived from V^{s), the philosophy of the implementation presented in this article is to count all clusters 
efficiently by keeping track of their growing and disappearance, so that n{s) is derived from T"^{s). By comparing 
the standard deviation of the estimates, and the costs (CPU time), the efficiency is found to be at least one order of 
magnitude higher. At the same time, the complexity of the algorithm is essentially unchanged, namely 0{9~^ log(iV)) 
instead of 0(6'^^), while a naive implementation of the counting of all clusters is typically of order 0{N). In the 
following the algorithm is described in detail. Because of its close relation to standard percolation, the algorithm 
presented below is also applicable for this classical problem of statistical mechanics. In fact, the percolation algorithm 
recently proposed by Newman and Ziff (Newman and Ziff, 2000, 2001) is very similar. Based on many principles 
presented in this paper, an asynchronously parallelised version for percolation has been developed recently (Moloney 
and Pruessner, 2003). 

2.3.1. Tracking clusters 

Usually each site is represented by a two-state variable, which indicates whether the site is occupied or empty. The 
variable does not need to indicate the state "burning" , because the burning procedure is instantaneous compared to 
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all other processes and can be implemented without introducing a third state (see sec. 2.3.4). In order to keep track 
of the cluster distribution, each site gets associated two further variables (in an actual implementation the number 
of variables can be reduced to one, see sec. 2.3.2), one which points (depending on the programming language either 
directly as an address or as an index) to its "representative" and one which contains the mass of the cluster the 
given site is connected to. The representative of a site is another site of the same cluster, but not necessarily and in 
fact typically not a nearest neighbour. This is shown in Fig. 5. If a site is empty, the pointer to a representative is 
meaningless. The pointer of representatives form a tree-like structure, because representatives might point to another 
representative, as shown in Fig. 6. A site which points to itself and is therefore its own representative, is called a 
"root" site, since it forms the root of the tree like structure. Only at a root site, the second variable, denoting the mass 
of the cluster, is actually meaningful and indicates the mass of the entire cluster. Each cluster is therefore uniquely 
identified by its root site: Any two sites, which belong to the same cluster have the same root and vice versa. By 
construction of the clusters (shown below), it takes less than ©(log TV) to find the root of any site in the system. 



FIG. 5 All occupied sites (black) on the lattice point to a representative. The site pointing to itself is the root of the cluster. 
The site shown in light gray is the one which is about to become occupied, as shown in Fig. 7. The labels on the sites are just 
to uniquely identify them in other figures. 



FIG. 6 The tree-like structure of the largest cluster in Fig. 5. 

The algorithm is a dynamically updated form of the Hoshen-Kopelman algorithm (Hoshen and Kopelman, 1976). 
The same technique has recently been used to simulate percolation efficiently for many different occupations densities 
(Newman and Ziff, 2000). The method described in the following differs from (Newman and Ziff, 2000), by not 
only growing clusters, but also removing them. While one of the main advantages of the original Hoshen-Kopelman 
algorithm is its strong reduction of memory requirements to 0{L'^~^), the algorithm described here only makes use 
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of the data representation proposed by Hoshen and Kopelman, so that the memory requirements are still 0{L'^). 

In the following the technique, how to create and to update the clusters, is described in detail. 

Starting from an empty lattice, the first site becomes occupied by setting the state variable. Since this site cannot 
be member of a larger cluster, its representative is the site itself. Therefore the mass variable must be set to one. The 
same pattern applies to all other sites which get occupied, as long as they are isolated. The procedure becomes more 
involved, when a site induces a merging of clusters. This is the case whenever one or more neighbours of the newly 
occupied site are already occupied. In general the procedure is then as follows: 

• Find the root of all neighbouring clusters. 

• Reject all roots, which appear more than once in order to avoid double counting. 

• Identify the largest neighbouring cluster. 

• Increase the mass variable of the root of this cluster by the mass of all remaining clusters (ignoring those which 
have been rejected above) plus one (for the newly occupied site). 

• Bend the representative pointers of the roots of all remaining clusters to point to the root of the largest cluster 
(keeps the tree height small, sec below). 

• Bend the representative pointers of the newly occupied site to point to the root of the largest cluster. 



FIG. 7 The configuration in Fig. 5 after occupying the highlighted site. Sites, the pointer of which have been changed, are 
shown in dark gray (site 6, 7 and 9). 

This procedure is depicted in Fig. 7, illustrating the join of the clusters shown in Fig. 5. As an optimisation, one 
could also bend the pointer of site 6 to point to site 3, which would effectively be a form of path compression. However, 
as shown below, the trees generated have only logarithmic height, so that the path compression possibly costs more 
CPU time than it saves for system sizes reachable with current computers^. It is important to note that only the root 
of the largest cluster is not redirected. 





Similarly for other forms of path compression, for example bending the pointer of the proceeding to the adjacent site in f ind_root 
(Fig. 8). 
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/* Find the root of the cluster identified by start_index. 

* All sites are expected to have a pointer to their 

* representative in the array pointer_of . The result 

* is stored in index. */ 
index = start_index 

WHILE ( index ! = pointer_of [index] ) { 
index=pointer_of [index] ; } 



FIG. 8 The f ind_root algorithm. All sites are expected to have a pointer to their representative in the array pointer_of . The 
result of this procedure is index. 

To find the root of a given site, which is necessary, whenever clusters are considered for merging, an algorithm like 
the one shown in Fig. 8 needs 0{hm{M{C))) time (worst case), where hm{M{C)) is the maximum height of a cluster 
containing M{C) sites, C being the cluster under consideration. 

All clusters arc constructed by merging clusters, which might often involve single sites. These clusters are represented 
as trees, like the one shown in Fig. 6. In the following this representation is used. By construction, if at least two 
trees join, the resulting tree has either the height of the tree representing the largest cluster or the height of any of 
the smaller trees plus one — whatever is larger. Thus, by construction, 



h, 



,(M) > h^{M') for any M > M' , (25) 

so in order to find the maximum height of a tree of mass M, one has to consider the worst case when the smaller 
trees have maximum height. For a given, fixed M, this is the case when only two cluster merge, so 

/im(M)<maxf max {hm{M - M')), max {\ + h.,n{M'))] , (26) 

where [^J denotes the integer part of M/2 > 0, which is is the maximum size of the smaller cluster. The outer max 
picks the maximum of the two max running over all allowed sizes of the smaller cluster. Using (25), 

M 

Ka{M) < max{h^{M~l),l + h^i[—\)) (27) 

so that 

f l + /i™(LfJ) for l + /i™(Lf J) >/i™(M-l) 
hm{M) < I (28) 
I hm[^I — 1) otherwise 

With /i„i(l) = 1 one can see immediately that 

hm{M) < riog2(M)l (29) 

by induction, nothing that [log2(Af/2)] = [log2(A/)] — 1, where \a] = [a\ + 1 for any a > 0. Hence 

h^{M) e 0{log{M)) , (30) 

which is therefore the (worst case) complexity of the algorithm. It is worthwhile noting that all the algorithms 
considered are just one solution of the more general union-find (and also insert) problem (Gormen et ai, 1990b). 

As the tree constructed is directed, there is no simple way to find all sites which are pointing to a given site. This 
means that splitting trees is extremely expensive in terms of complexity. However, in the DS-FFM trees do not get 
removed individually, but always as complete clusters. Thus, no part of the tree structure needs to be updated during 
the burning (s. section 2.3.4). 



2.3.2. Reducing memory requirements 

The three variables (state, pointer, size) mentioned above would require a huge amount of memory: At least a bit 
for the state (but for convenience a byte) , a word for the address and a word for the mass (actually depending on the 
maximum size of the clusters). However, as the pointers are only meaningful if the site is occupied, the representative 
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pointer can also be used to indicate the state of a site: If it is (or NULL if it is an address), the site is empty and 
occupied otherwise. 

As mentioned above (section 2.3.1), the mass variable is meaningful only at a root site. Since only a certain range 
of pointers is meaningful, the remaining range can be used to indicate the mass of a cluster. Assuming that indeces 
can only be positive, negative numbers as the value of the pointer can be interpreted as self references and their 
modulus as total mass of the cluster. The concept is restricted to system sizes which are small enough that the 
space not occupied by meaningful pointers is large enough to store the mass information. How large is the maximum 
representable system size (not to be confused with memory requirements, which is N times word size)? For a word 
size oi b — 4 byte, i.e. M = 2^^ representable values in a word, the maximum system size is = 2^^ — 1, namely 
_1 . . . _ jV values for indicating masses, 1 ... A" for indeces and for the empty site, summing up to 2A^ + 1 < M, 
which is overruled by the memory required bN < M , as M is (usually) the maximal addressable memory for a single 
process. 





♦ M • 














* bN * 



FIG. 9 The memory layout when using addresses as pointers to representative. The hatched area is used for valid addresses, 
what remains left can be used to represent cluster masses, i.e. if the value of an address points into the white area, the value 
is interpreted as a mass. 

When using addresses as pointers, it is less obvious how to identify the range of meaningless pointers which could 
be used to store the mass information. In order to distinguish quickly whether a given value is an address or a 
mass, the most obvious way is to use higher bits in the pointers. What is the range of meaningless addresses? The 
addresses are words, occupying bN bytes. If each byte is individually addressable (as usual), their value differs by 
at b, i.e. they span a range of bN different values. As shown in Fig. 9, the largest remaining continuous chunk of 
values, not used for references to representatives, has therefore at least size \{M — bN)/2] = (M — bN)/2, assuming 
that the pointer values used, which is also the range of addresses where they are stored, spans a continuous range. If 
the A + 1 different cluster masses are to be represented as pointer values pointing into the meaningless region, one 
has 1 + A^ < (M — bN)/2, i.e. (6 + 2) A" + 2 < M. If they do not have to be continuous, the condition is relaxed: 
1 + A < M — bN. Alternatively one can make use of the lower bits: If the pointers point to words in a continuous 
chunk of memory or at least are all aligned in the same way, then all pointers are identical (mod 5), i.e. all pointers 
p obey p = c (mod b) where < c < 6 is a constant. Since 6 > 1 one can use p ^ c (mod b) to indicate that a given 
pointer value is to be interpreted as mass, which can easily be calculated via a bit-shift. 

In C it is reasonable to represent the sites as void * and interprete these as pointers to other sites, i.e. void **, 
so that the loop to search for a root just becomes the code shown in Fig. 10. 

Representing each site by a word instead of a byte or even a bit (Honecker, 1997), still leads to reasonably small 
memory requirements for typical system sizes (for instance a system of size A^ = 4096 x 4096 would require 64 AIB) . 
Since the algorithm has an almost random memory access pattern, it is not reasonable to implement it out of core 
(Dowd and Severance, 1998). In order to simulate even larger sizes, the following representation has been implemented: 
At the beginning of the simulation the entire lattice is splitted in cells so that whatever site in such a cell is occupied, 
it must belong to the same cluster as any other occupied site in the same cell, i.e. each site in the cell is nearest 
neighbor of all other sites in the cell. On an hyper-cubic lattice these cells have size 2, as depicted in Fig. 11: Each site 
within such a cell must belong to the same cluster if it is occupied. Therefore only one pointer is necessary to refer to 
its representative. On a triangular lattice these cells would have size 3. Since a pointer can be non-null, although not 
all sites in the cell are occupied, a new variable must represent the state of the sites in each cell, if not lower or higher 
order bits of the pointers can be used (see above). On the hyper-cubic lattice the memory requirement is therefore 
for each pair of sites 2 bit for the state and 1 word for the address or index of the representative. Storing the 2 bits 
in a byte (and keeping the remaining 6 bits unused), the memory requirements are therefore reduced to (b + l)N/2 
bytes. Using indices the maximum representable system size is given by 3/2N -f- 1 < -^^ and using pointers with a 
size identification as shown in Fig. 9 the constraint is 1 + A^ < {M — ^)/2 in worst case. 
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void *start_pointer , *root, *content; 

/* start_pointer is the address of the site, the root of which 

* is to be found, root will always point to the site currently 

* under consideration, while content is always the address 

* root is pointing to. 

* The macro IS_SIZE verifies , whether the value given is a size. */ 



/* Initialise: Assume that start_pointer is the root and 

* read its content. */ 

for (content=* ( (void **) (root=start_pointer) ) ; 

/* Test whether root's content is actually a size. */ 
(!IS_SIZE(content)) ; 

/* Iterate: content is not a size, so the next candidate 

* is what root is currently pointing to. 

* Content is updated accordingly. */ 

coiitent=* ( (void **) (root=content) ) ) ; 



FIG. 10 An implementation of f ind_root in C using pointers to void. 
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FIG. 11 If occupied, each site within a dashed box belongs to the same cluster. On a triangular lattice the dashed patches 
would be triangular, each one containing three sites. The thick dashed line shows the orientation of the boundary between two 
consecutive slices in the parallelised code, see Sec. 2.5. 



2.3.3. Efficient histogram superposition 

So far, only the maintenance of the cluster structure has been described. Since the masses of all clusters involved 
are known, it is simple to maintain a histogram of the cluster mass distribution: If a cluster of size s is burnt, the 
corresponding entry in Vf[s) is decreased by one. If a cluster changes size, T^f (s) is updated accordingly. For example, 
when two clusters of size si and S2 merge as a particular site is newly occupied during the growing procedure, Vf{si) 
and Vf{s2) arc decreased by one and T^f (si + S2 + 1) is increased by one. 

Naively, the average cluster size distribution is the average of T^f (s), i.e. 

t'=i 

with T as the number of iterations. Depending on the resolution of the histogram, it would be very time consuming 
to calculate this sum for each s. Using exponential binning (which is in fact a form of hashing) in order to reduce the 
size of the histogram solves the problem only partly. 



14 



Ignoring any hashing, a naive superposition, where each slot in the histogram needs to be read, has complexity 
0{TH), where H is the largest cluster mass in the histogram. 

This problem is solved by noting that early changes in the histogram propagate though the entire sequence of 
histograms. Denoting the initial histogram as Vgis) and AV^is) = V^_i{s) — 'Pf{s) then 

t 

Vt{s)^V^{s) + Y.^'^t'{s) (32) 

t'=i 

and therefore 

T T 

Y,Vt{s)=TV^{s) + Y.^T-t' + l)AVt,{s) . (33) 

t=i t'=i 

By using this identity only the right hand side of (33) is maintained by increasing it by T — t + 1 when a new cluster 
is created at time t and by decreasing it by the same amount when it is destroyed. In this way, the complexity is only 
of order 0{T{9~^ + 1)), according to the number of clusters created and destroyed, i.e. the number of changes in the 
distribution. This concept becomes only problematic, if floating point numbers are used to store the histogram and 
the accuracy is so small that changes in the sum by 1 do not change the result anymore. The maximum value in 
(s), where this docs not happen, is given by the largest m with m+l ^ m where m is a variable of the same type as 
Vfi^s). For floating point number, the value of m is related to the constant DBL_EPSILON (or FLT_EPSILON for single 
precision) , which essentially characterises the length of the mantissa. The concrete value of m is actually platform, 
precision and type dependent. For an imsigned integer of size 4, this value would be {2^^ — 1) — 1, corresponding to 

ULONG_MAX- 1, for double precision IEEE75 floating point numbers this value is FLT_RADIX**DBL_MANT_DIG- 1, i.e. 

253 „ I 

Provided that the right hand side of (33) is below the threshold m discussed above for all s, this means that only 
a single histogram needs to be maintained. It is initialised with T'P^{s) and updated with ±(T — t + 1) at time step 
t, when a cluster of size s appears or disappears. It is worth mentioning that this concept obviously even works in 
conjunction with binning (or any other hashing). 

2.3.4. Implementation of the burning procedure 

The burning procedure was implemented in the obvious way, without making use of the tree structure, as shown in 
Fig. 12. Although the burning procedure could also be implemented explicitly recursively, it is of course significantly 
faster when implemented iteratively. The usage of a stack in the procedure might be thought of as reminiscent of the 
underlying recursive structure. The number of times the outer loop in Fig. 12 runs, defines the generation of the fire 
front and gives Tm; other properties of the burnt cluster can be extracted accordingly. The most important resource 
required by this procedure are the stacks: One for the currently burning sites and one for the sites to be burnt in the 
next step. There is no upper limit known for the number of simultaneously burning sites, apart from the naive N/2 
on a hyper-cubic lattice, which comes from the observation that sites, which belong to the same generation of the fire, 
must reside on the same sub- lattice (even or odd). 

On the other hand it is also trivial to find the maximal number of sites which burn at the same time, if the fire 
starts in a completely dense forest, i.e. in a lattice with p = 1. Obviously the size of the tih. generation is then given 
by A(t — 1) for t > 1 and 1 at the beginning, t = 1. Since the sum of these numbers is the number of sites which 
is reachable within a certain time t, the sum is also an upper limit for the number of simultaneously burning sites. 
Indeed, the actual number can easily be larger than 4(t — 1), caused by arrangements of wholes in the lattice, which 
delay the fire spreading at certain sites, so that they burn together with a larger fire front. Such a construction is 
shown in Fig. 13. 

Of course it is neither reasonable, nor practically possible to provide enough memory for the theoretical worst case, 
i.e. two stacks each of size N/2. Indeed the typical memory requirements seem to be of order 0{\/ 6^^), as shown 
in Table I, where /^^^ denotes the largest fire front observed during the simulation. Providing stacks only of size 4L 
turned out to be a failsafe, yet pragmatic solution. Formally one could implement a slow out-of-core algorithm in the 
rare yet possible case the memory for the stack is insufficient, i.e. use hard-disk space to maintain it. In fact, this is 
what de facto happens if one uses a stack of size N /2 on a virtual memory system. 



* For integers the precision is not a problem, but the maximum rcpresentable number easily becomes a problem. 
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/* Initialise current_stack. */ 
CLEAR current_stack; 

/* Put initial site on current_stack. */ 
PUT rn ON current_stack; 

/* Sites are cleared right after they have entered the current_stack. */ 
rn = empty; 

/* The first loop runs until there is nothing left to 

* burn, i.e. next_stack was not filled during the inner loop. */ 
DO { 

/* Clear next_stack so that it can get filled in the next loop. */ 
CLEAR next_stack; 

/* The next loop runs as long as there are sites left to burn 

* in the current generation of the fire. */ 
WHILE current_stack not empty { 

/* GET: remove the upmost element from current_stack and 

* put it in X */ 
GET X FROM current_stack; 
/* Visit all neighbours */ 
FDR all neighbours n of x { 
if (n occupied) { 

/* Put occupied sites on the current_stack of the next 

* generation of the fire */ 
PUT n ON next_stack; 
n = empty 

} 

} 

} 

/* The next current_stack to be considered is next_stack. */ 

current_stack = next_stack; 
} WHILE current_stack is not empty 

FIG. 12 The burning procedure starting at rn. In an actual implementation the copying of next_stack to current_stack 
can easily be omitted by repeating the code above with current_stack and next_stack interchanged, similar to a red-black 
approach (Dowd and Severance, 1998). 



2.3.5. Complexity of the algorithm 

The overall complexity of the algorithm has two contributions: The "growing" part, where new clusters are generated 
from existing ones and the "burning" part. The time needed for the burning part is proportional to the number of 
sites burnt and therefore expected as 0{s) (see (1) and (7)) and 0{N) in the worst case. Since p in (7) is bound, 
the complexity of "burning" is 0{9~^) (expected). The complexity of "growing" is estimated by the average number 
of sites newly occupied, 9~^, times the worst-case complexity (30) to find the root of any given site, because up to 
four roots need to be found at each tree growing. According to (30) the worst case complexity to find the root of 
any given site is 0{log{N)), leading to an overall complexity for "growing" of C'(log(iV)6'"^) D 0{9~^). In practice 
the logarithmic correction is negligible, especially since log(A^) is an extreme overestimate of the average case and 
therefore essentially the same runtime-behaviour is expected for both procedures (Newman and Ziff, 2001). 

Implementations like the one in (Honecker, 1997) avoid this logarithmic factor by counting only the burnt cluster 
and therefore arrive at an overall complexity of 0{9~^). 

The algorithm presented has therefore only a negligibly higher computational complexity compared to implemen- 
tations which measure only V^. This is corroborated by the comparison of the CPU time per burnt cluster during 
equilibration, i.e. the transient, when the cluster structure do not need to be maintained and the algorithm used is 
the standard implementation, to the CPU time per burnt cluster during statistics, i.e. when obscrvables arc actually 
measured and especially T"^ is produced. This ratio is shown as C in Tab. I and Tab. II. It varies only slightly with 
L or 9-\ 

Apparently the algorithm presented offers more statistics, however it suffers from one limitation: It requires about 
{b + l)N/2 bytes memory (see section 2.3.2), compared to bytes in bitwise implementations like (Honecker, 
1997), i.e. typically a factor 20 more. In order to ascertain whether this disadvantage is acceptable with respect to 
the statistical gain, one has to determine the standard deviations of the calculated quantities for both implementations. 
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FIG. 13 The burning order for a 6 x 6 patch of sites, where seven sites are not occupied and form a barrier, such that some 
sites behind it burn later, together with the fire front propagating away from the starting point of the fire at the lower left 
hand corner. The sites belonging to the largest set of trees burning at the same time are shown in light gray, unoccupied sites 
are shown in white, occupied sites in black. The numbers indicate the generation of the fire, which is one plus the Manhattan 
distance from the starting point of the fire along occupied sites. 

2.4. Calculating the standard deviation 

In order to compare the two algorithm rigorously, it is necessary to estimate the standard deviation of the estimators 
for n{s) produced by them (Landau and Binder, 2000; Miillcr-Krumbhaar and Binder, 1973): 



(34) 



Here r-pb and rpa are the correlation times of the two quantities. Calculating the correlation time in the standard 
fashion by recording the history 'Pf{s) and 'P^{s) for each s would mean to store millions of floating point numbers. 
Therefore it was decided to restrict these calculations to just a small yet representative set of s values. The result 
shows that the standard deviation does not fluctuate strongly in s. 

Because of the special form of V^is) E 0, 1, its variance is particularly simple, 



(V^isf) = (Vns)) (35) 



so that 



<^U'^)=^-^f^{V':{s)){l-{vns))) . (36) 

The correlation time of V^{s) is expected to be extremely small, not only on physical grounds — an cluster can 
only burn down once — but also because of the extreme dilution of V^^is), as it was described in section 2.2.1. For 
fixed s, most of the P^{s) are 0. In contrast, the 'Pf{s) arc expected to have a large correlation time, because "only" 
9^^ + 1 entries are changed between two subsequent histograms. 

The correlation function is calculated in the symmetric way as proposed in (Anderson, 1964), here for an arbritrary 
quantity At'. 



{A1)t - {At)l 
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TABLE I Performance data for different parameters and setups. "ap3000,2" denotes a parallel run on two nodes on an AP3000, 
accordingly "ap3000,4". "cluster, 10" denotes a cluster of 25 Intel machines, connected via an old 10 MBit network, "cluster, 100" 
denotes the same cluster on a 100 MBit network, "singlel" and "single2" denote two different types of single nodes. The largest 
fire front, /max, was only measured on these systems. The quantity ^ is the ratio of the average time (real time on the parallel 
systems in order to include communication overhead, user time on single nodes) for one successful update during statistics, i.e. 
when all data structures need to be maintained, and equilibration (transient) i.e. when the standard representation is used. 



System 


L 


6 


C 


,/„ax 


ap3000,2 


8000 


4000 


1.51 




ap3000,2 


8000 


8000 


1.52 




ap3000,4 


16000 


4000 


1.34 




ap3000,4 


16000 


8000 


1.48 




ap3000,4 


16000 


16000 


1.37 




ap3000,4 


16000 


32000 


1.41 




cluster, 10 


32000 


4000 


2.71 




cluster, 10 


32000 


64000 


3.81 




cluster, 100 


32000 


32000 


1.76 




singlel 


1000 


500 


1.41 


216 


singlel 


2000 


1000 


1.41 


326 


singlel 


4000 


125 


1.42 


106 


singlel 


4000 


250 


1.47 


172 


singlel 


4000 


500 


1.48 


255 


singlel 


4000 


1000 


1.53 


317 


singlel 


4000 


2000 


1.50 


518 


singlel 


4000 


4000 


1.57 


646 


singlel 


4000 


8000 


1.48 


907 


singlel 


4000 


16000 


1.45 


1327 


single2 


8000 


4000 


2.11 


687 


single2 


8000 


8000 


2.11 


912 


single2 


8000 


16000 


2.09 


1415 



where Ox-t' denotes the average taken over time t from t = Itot — T~t' . The quantity cf)^^ was fitted to exp(— t/r^i) 
in order to find the correlation time ta- The results are given in Table II. 

As described in (6) and (8), the two estimators for n{s) differ slightly. However, except for n(s) only constant values 
appear on the RHS of (6) and (8), so that the relative errors of {T't{s))T and {Vt{s))T are also the relative errors of 
the estimators for n{s) derived from them. These relative errors are shown in Tab. II as well. Their ratio is given as 
a and is an indicator for the advantage of the algorithm proposed. If the relative error is to be improved by a factor 
q, one needs to invest CPU-time, i.e. if the algorithm proposed in this paper costs a factor ( more CPU-time, and 
the gain in the relative error a, the total gain is a^/C- The values for this quantity are also given in table II. 

According to the table, for fixed 6 relative errors and the correlation times are only weakly affected by an increase 
in system size. At first sight, this is counter-intuitive, as the number of passes (Henley, 1993; Honecker and Peschel, 
1997), the mean number of times a site has been visited between two lightnings, decreases inversely proportional to the 
total number of sites in the system: l/{9pL^), see Section 3.2. Assuming that this number is essentially responsible 
for the error, suggests to keep the number of passes constant among different L. However, this is apparently not the 
case, possibly because of self-averaging (Ferrenberg et ai, 1991) effects. 

The table also shows various tendencies, which arc worth mentioning. First of all, the total gain becomes smaller for 
larger avalanche size s. The B in front of some of the values indicates that a bin around the s value was investigated, 
i.e. the time series of 

E ^"'(^') (38) 
s'eB 

was considered, where S is a set of (consecutive) s values, representing the bin. For larger values of s, these sets 
get exponentially larger, which is necessary for a reasonably large number of events as basis for the estimators. The 
general tendency that the proposed algorithm is even more efficient at small s is not surprising: samples from 
sn{s), while samples only from n{s), i.e. "sees" larger cluster more often. Nevertheless V"" still is advantageous 
by roughly a factor 5. The empty entries in Tab. II are due to numerical inaccuracies or simply missing simulations 
for certain parameters. Some entries are estimated and marked as such. 

There is an additional correlation not mentioned so far: The individual points in the estimator of the distribution 
are not independent. There are "horizontal correlations", i.e. ■p'^(s) is correlated for different values of s. These 
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TABLE II Correlation times Tb and Ta of the corresponding observables and as a function of s and for different parameters 
L, 9~^. Values of s marked by "B" are results for bins around the s value indicated. For each set of parameters, the quantity 
^ is given. It denotes the ratio between the average CPU-time for one successful update during equilibration (transient) and 

during statistics, see also Tab. I. The two fractions -^^^^^^-yy-, ^(-p^^s))^ ■ their ratio a and a^/C are derived. * marks cases, 
wheres Tt{s) = has been assumed. ^ marks values of ra{s), which have been extrapolated from ra{s) for smaller s. 



L 


9 


C 


s 


n{s) 


ra{s) 


(•pf(s)) 




a 


2 /A 

aye 


4000 


4000 


1.57 


10 


- 


- 


0.0138* 


- 


- 


- 








100 


0.170 


23.6 


0.0637 


0.00099 


64.3 


2633.4 








B 10=* 


0.028 


14.2 


0.0450 


0.00191 


23.6 


354.8 








B 10* 


0.006 


10.0 


0.0412 


0.00470 


8.8 


49.3 








B 10'^ 




7.2 


0.0662* 


0.02104 


3.1 


6.1 


4000 


16000 


1.45 


10 


0.013 


39.9 


0.0141 


0.00056 


25.4 


444.9 








100 


0.126 


28.8 


0.0608 


0.00127 


48.0 


1589.0 








B lO''^ 


0.006 


4.7 


0.0457 


0.00175 


26.1 


469.0 








B 10* 


0.013 


2.9 


0.0512 


0.00332 


15.4 


163.6 








B 10'^ 




2.2 


0.0433 


0.00795 


5.4 


20.1 


8000 


1000 


- 


10 


0.131 


- 


0.0154 


- 


- 


- 








100 


0.122 


284.6 


0.0602 


0.00158 


38.1 


- 








B lO''^ 


0.028 


236.5 


0.0399 


0.00337 


11.8 


- 








B 10* 


0.016 


163.5 


0.0397 


0.00878 


4.5 


- 


8000 


4000 


2.11 


10 


0.122 


78.2 


0.0154 


0.00052 


29.8 


420.9 








100 


0.132 


16.4 


0.0634 


0.00087 


72.9 


2518.7 








B 10^ 


0.022 


8.2 


0.0438 


0.00147 


29.7 


418.1 








B 10* 


0.005 


5.5 


0.0442 


0.00241 


18.3 


158.7 








B 10^ 




4.2 


0.0409* 


0.01006 


4.1 
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are additional correlations due to clusters of small sizes, which are likely to grow and propagate through s in Vf{s) 
for consecutive time steps, i.e. 

{Vt{s)Vt,{s'))~{Vt{s)){Vt,{s')) . (39) 

This correlation is at least partly captured by the correlations measured for the binned data. It is to be distinguished 
from the correlations of independent realisations, where correlations are expected in the cluster size distribution also, 
i.e. 

{Vt{s)Vt{s'))-{Vt{s)){Vt{s')) . (40) 

This must be taken into account as soon as estimates of n{s) for different s are compared, as it is done when an 
exponent is calculated by fitting. This effect is also present for V^, which is, however, diluted so enormously that it 
influences the outcome only in an insignificant way. 

The horizontal correlations could be estimated using a Jackknife scheme (Efron, 1982), similar to that used to 
calculate the error bar of the exponent from the time evolution of a quenched Ising model (Pruessner et ai, 2001). 
While it is certainly essential for the careful estimation of the error bar of an exponent, it is irrelevant for the discussion 
in this paper, as it is quantitatively based only on local comparisons of error bars (overlaps), while its global properties, 
i.e. shape and collapse with other histograms estimated, is not concerned with errors bars. Some authors even seem 
to dismiss the relevance of these correlations completely (Newman and Ziff, 2001). 
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2.5. Parallelising the code 

Constructing clusters and keeping track of clusters rather than of single sites seems to be in contradiction to any 
attempt to run the algorithm distributed, that is splitting the lattice into S slices (one-dimensional decomposition 
— as periodic boundaries apply, the slices may better be called cylinders). Moreover, there is a general problem of 
parallelisation which becomes apparent in this context: The usual bottleneck of parallel systems is the communication 
layer. In order to keep the communication between sub-lattices as low as possible, fast parallel code on a lattice requires 
as few interaction between slices as possible, while the whole point of doing physics on large lattices is the assumption 
of significant interaction between their parts. It is this fundamental competition of requirement and basic assumption 
which makes successful parallel code so rare and which seems to indicate that problems must have very specific 
characteristics in order to be parallelisable in a reasonable way. 

However, it is indeed possible to run the algorithm described above on parallel machines successfully in the sense 
that it does not only make use of the larger amount of (distributed) memory available, but also of the larger amount of 
computing capabilities. In fact, the code was successfully rewritten using MPI (Gropp et ai, 1999) and has been run 
on two systems with distributed memory: The massively parallel machine AP3000 at the Department of Computing 
at Imperial College and on a cluster of workstations (25 nodes). 

In the following the most important design characteristics are described which proved important in order to make 
the code running reasonably fast. This concerns mainly the statistics part, but the equilibration also needs some 
tricks. 

MPI assures that packets sent from one node to another in a certain order are received in exactly the same order — 
in the language of MPI this means that the message ordering is preserved in each particular communicator. But, how 
different communicators relate to each other, i.e. how one stream of packets relates to another one is not specified. If, 
for instance, node A sends a packet to node B, and then to node C, which then sends a packet to node B, this packet 
might arrive earlier at B then the packet first sent by A, see Fig. 14. 




FIG. 14 Nodes A, B and C send messages in the order indicated. However, it might well happen that the message sent last 
by node C to node B, namely message 3, arrives at that node before message 1, sent before message 2 was sent, which arrived 
before message 3 was sent. 

However, it is one of the main goals of parallelisation, to avoid any kind of synchronisation, which is extremely 
expensive. Even in a master-slave design, as it was chosen here, one encourages communication between the slaves, 
whenever they can anticipate what to do next or can indicate each other what to do next. 

As explained above (2.1) an update consists essentially of two steps: Growing and burning. Both processes now are 
distributed among the slices. The growing procedure is realized by trying to grow 9^^ / S trees in each slice. This is 
not an exact representation of a growing procedure taking place on the entire lattice at once, because the latter has 
a non- vanishing probability to grow all trees at one particular spot, while the parallelised version distributes them 
evenly among the different slices. Provided that 6*"^ is large compared to S, this effect can certainly be neglected. 
The advantage of the procedure is that the growing procedure at each slice does not need to be conducted by the 
master. The burning procedure is more complex, as the fire starts at one particular site of the entire lattice, so that it 
must be selected by the master. The exact procedure of the possibly following burning process depends on the stage 
of the algorithm. 

In the following the procedures are explained in terms of "sites" rather than "cells", as introduced in section 2.3.2. 
Using cells instead of sites makes the code slightly more complicated, but the changes are obvious. If the cells are 
oriented parallel to the borders of slices (see Fig. 11), so that its width is a multiple of 2 in case of a hyper-cubic 
lattice, the algorithm runs considerably faster, as the communication between the nodes is reduced by the same factor. 
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2.5.1. Equilibration 

During the equilibration phase it is not necessary to keep track of all clusters. Nevertheless there is some statistics, 
which is very cheap to gather: The distribution of burnt clusters and the density of trees. The latter is very simple, 
as this number changes in time only by the number of grown trees minus the number of burnt trees. This is also 
a crosscheck for the overall statistics, as the tree density is equivalent to the probability of a site to belong to any 
cluster (4). 

The burning is implemented as follows: The master chooses a site from the entire lattice and sends the corresponding 
slice (slave) the coordinate and (implicitly) an identifier which uniquely identifies this request within this update step. 
The slice's response consists of the number of sites burnt (possibly 0), the identifier referring to the initial request 
and possibly up to two further, new, unique identifiers. These identifiers refer to the two possible sub-requests to the 
right and left neighbouring slice due to a spreading of the fire. If a slice contacts another slice, it docs so by sending 
the coordinates of sites, which are on fire in the sending patch, together with a unique identifier. The slice contacted 
sends it result to the master, again together with the identifier and possibly two new ones, corresponding to the 
possibly two contacted neighbouring slices. In this way the master keeps track of "open (sub)requests" , i.e. requests 
the master has been told about by receiving an answer containing information about sub-requests, which have not 
been matched by receiving a corresponding answer. The structure of requests forms a tree-like structure and if there 
are no open requests, the master must have received all answers of the currently burning fire. It is very important to 
make it impossible that by a delay of messages some answers are not counted, as it would be, if the master would just 
count open requests, without identifying them individually. It can easily happen that the master receives an answers 
for a request, without having received the information about the very existence of the request. It is worth to mention 
that in this scheme the order of burnings is irrelevant, if the burn-time is not measured, as it was done here. 

Adding up the number of burnt sites gives the total size of the burnt cluster. This number is finally sent to all 
slices. If it is nonzero, the step is considered as successful. 

After equilibration the cluster structure of pointers and roots as described above (see section 2.3.1), needs to be 
constructed. This is done in a naive manner: Keeping track of sites, which have already been visited, every site is 
visited once. The first site visited in each cluster becomes root of all sites connected to it, which become marked as 
visited. The procedure corresponds to the burning procedure described above (see section 2.3.4). 

Each slice maintains a local histogram V^, which contain all clusters, which do not have a site on the border to 
another slice. Otherwise, they are maintained at the master's histogram, as discussed below. In this case the (local) 
root site of these clusters are moved to the border. As periodic boundary conditions apply, the only boundaries are 
those with other slices. 



2.5.2. Collecting statistics 

After finishing the equilibration phase another concept needs to be applied in order to count the total cluster size 
distribution Vf{s). At every update of the lattice each slice must keep track of the clusters in the same way as it was 
described in section 2.3.1. Clusters, which do not contain a site at a border to another slice are maintained locally, 
i.e. at each node as a local histogram. However, if a cluster contains a site at a border, it might span several slices. As 
soon as a cluster acquires a site at the border, it is removed from the local histogram and the site under consideration 
becomes the root of the cluster. The algorithm ensures that a cluster with at least one site on the border has its root 
at the border. 

During all processes (growing or burning), the size of all clusters is updated as usual, independent from the location 
of the root. If the status of a border site changes, its new value or its change is put on a stack together with its 
coordinate. During the growing procedure the following changes of the status are possible: 

• New occupation: Change in occupation information for a site (cell); If this is the only change, then it must 
have been already occupied (this is only possible, in an implementation using cells). If this is not the case, the 
reference information pointing to the root site of the given cluster, must be updated also, see next point. 

• Merging border clusters: Change of the reference information for a site (cell); This can only happen if the site 
(cell) was (completely) unoccupied at the time of the change or did contain a size information, i.e. it was itself 
a root. 

• General merging of clusters: Change in size information for a site (cell); Only an increase is possible, so that 
any change can be represented by a single number indicating the size difference. 

For each border site changing at each slice, the corresponding information are sent to the master. Typically the 
number of messages is not very large, because the total number of sites updated during a single growing phase is 
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limited by / S. The expected number of these messages is not given by the fraction of border sites in each sHce, 
because changes in all border clusters (i.e. clusters with at least one site in the border) affect the border sites, as the 
root of each border cluster is a border site. 

However, the data regarding the updates in the border do not need to be send from the slaves to the master, if 
the burning attempt following the growing fails, i.e. if an empty site has been selected for lightning. Of course it is 
much more efficient not to send any data if not necessary. As there is only a finite number of sites in each slice, the 
theoretical limit of updates of border sites is bound by this number. However, it is sufficient to allocate a reasonable 
amount of memory (2L turned out to be enough) for the stack of messages to be sent and check its limits, similar to 
the stack used in the burning procedure described in section 2.3.4. Henceforth the sending of the update information 
of the border is called "sending the border" . 

The master maintains a copy of the state of the border sites and updates a global histogram of border clusters. 
By sending the changes on the border to the master as described above, the master can update its copy of the 
configuration of the borders as well as the global histogram. At the end of the simulation all histograms {S slaves 
histograms plus the global histogram maintained by the master node) are summed to produce the total V^. 



A C 




B D 



FIG. 15 The slices, three of which are shown here, maintain the references for all clusters within each slice (illustrated by 
arrows), even for border clusters. The references between slices, however, are maintained by the master. The variables A = 0, 
_B=L-1, C — I and D — I + L — 1 are the indices used for references within each slice. 

As suggested in Fig. 15, the slices maintain the pointers within each slice and these references are not changed by 
the master, which only connects between slices. If a reference at the border changes at a slice, the master receives a 
message to apply the corresponding changes (joining two clusters), if the size of a cluster changes, the master updates 
the corresponding unique root etc. These changes are indicated by the slaves and the master only realises them in the 
copy of the border sites. Only if a change in occupation occurs, the master must actually perform some non-trivial 
operations, because a newly occupied site might introduce a new connection between borders of different slices. From 
the point of view of the master, only borders belonging to two different, neighbouring slices are directly connected 
and therefore to be maintained by the master, while the connectivity of the borders within each slice is indicated 
and maintained by the corresponding slave. Apart from that, the master maintains the slice spanning structures in 
exactly the same way as the slaves, e.g. a cluster having multiple roots among the various slices has a unique root at 
the master etc. 

The question arises how the master best keeps track of the changes of the borders. Ideally, a change of reference of 
a site at the boundary is communicated to the master simply by sending the new pointer value (index). By choosing 
a reasonable indexing scheme, this is indeed possible. If the value of the reference is within and L — 1, where L is 
the width in terms of number of sites (or cells) (see Fig. 15), the reference denotes a site in the left border within 
the same slice. Similarly, if the value of a reference is within / and I + L — I, where / denotes the first index in the 
last column, a reference with such a value is bound to point to the right border of the same slice. If the master uses 
indexes of the range [L,I — 1] for denoting cross-references between slices the references are therefore unambiguous 
and no translation is necessary between indeces used by the slices and indeces used by the master. 

During the burning procedure the master can make use of its knowledge about the borders. The site selected for 
starting the fire is most likely a bulk size, so that the corresponding slave needs to be contacted for the occupation 
information. Three outcomes are possible: 

• The site is unoccupied. Nothing happens, all slices get signalled to continue with growing. 
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• The site is occupied, but does not contain a border site. In this case the shce contacted can send back the size 
of the burnt cluster (an information it knows even without actuaUy doing the burning as the size is stored in 
the root, which needs to be found anyway in order to find out whether the cluster is a border cluster) and the 
master can signal all other slices to send the border and to continue. After receiving the borders it can update 
the histogram.^ 

• The site is occupied and contains a border site. In this case the slice sends the reference of the border site back 
to the master, which then contacts all slices to send the most recent border update. It updates the border and 
the histogram, deletes the cluster which is going to burn and sends the "burning borders" , i.e. a list of all border 
sites which will be affected by the burning procedure to the slices in form of a stack as described in section 
2.3.4. The slaves use this stack as the initial stack of the burning procedure and delete the corresponding sites. 
No communication between the slices is necessary. 

The global histogram contains much larger clusters than the local histograms. In order to keep memory requirements 
low, even for histograms of resolution unity, it is reasonable to introduce a threshold, above which slaves use the 
global histogram to maintain even for local clusters (i.e. non border cluster). For that purpose a histogram 
"appendix" has been introduced. This is a finite stack, which stores the size of the cluster s together with the value 
of t' — ±{T — t + I) as described in section 2.3.3. During the growing phase when such large clusters grow fastly, one 
would obtain a sequence of stack entries of the form (s, t'), (s, —t'), (s + 1, t'), {s + 1, — t'), (s + 2, t'), . . . , corresponding 
to entering the appendix, (s, t'), increasing in size by 1, which gives (s, ~t'), (s + 1, t') etc. As soon as a cluster is larger 
then the upper cutoff each update causes two entries, of the form (s, ~t'), (s + l,t'), the first for the deletion from 
the histogram, the second from the increase in the next slot. These entries possibly cancel, for example the sequence 
above is equivalent to the single entry (s + 2,i'). It turned out to be highly efficient to perform this cancellation, i.e. 
to check the last entry in the appendix for being the negative entry of the one to be done. 

As the maximum size of the appendix is finite, it must be emptied from time to time. The information about the 
size of the appendix of each slave is sent to the master together with the information about the borders. If a possible 
overflow is detected (2/3 of the maximum size in the implementation presented) the master requests all slices to send 
the content of their appendices and applies it to the global histogram. The slices then empty their appendices. 

2.5.3. The Random Number Generator 

The random number generator (RNG) acquires a crucial role when used in a parallel environment. With M the 
number of iterations, the expected number of calls of the RNG is M6~^/p (for M « 10^, w 5 x 10'' this is 
more than 5 x 10'^), so that an RNG as rani in (Press et ai, 1992) with a period of only w 2 x 10^ is insufficient. 
Therefore ran2 in (Press et ai, 1992) was used for all simulations, both parallel and non-parallel, which has a period 
of > 2 X 10'*. If the number of RNG calls is small enough, one can compare results obtained by means of rani and 
ran2. No significant deviation was found. 

In the parallel implementation, each slave requires an independent sequence of random numbers. This is a classical 
problem in parallel computing (Aluru et al., 1992; Coddington, 1996). The simplest solution is to divide a single 
sequence ri,r2,... into distinct subsequences. This can be done either by a leapfrog scheme (Coddington, 1996; 
Entacher, 1999), where each subsequence consists of random numbers which are S calls away, i.e. S subsequences 
of the form r^, rs+u, T2S+u, ■ ■ ■ with u ~ 1,2, . . . , S unique at each slave, or by splitting the sequence (Coddington, 
1996), so that each subsequence consists of consecutive RNG calls, i.e. ri^uX, f2+uX, ^a+uX again with u = 1, 2, . . . , S* 
and offset X large enough to avoid any overlap. The latter scheme has the advantage that the sequence consists of 
consecutive RNG calls and therefore has been used in the following. The implementation of the offset X at each slave 
is easily realised by restoring all state variables of the RNG, which have been produced once and for all in a single 
run producing all XS random numbers and saving the state variables on a regular basis. However, such a technique 
is advisable only if the RNG calls do not dominate the overall CPU time, in which case it would take almost as long 
as the simulation itself to produce the random numbers required for it. 



^ One might be inclined to postpone the sending of the borders to a time, when it is really needed. However, after a successful burning 
the time t is increased and this enters the histogram (see section 2.3.3). Ignoring this change for a large number of steps would introduce 
uncontrollable deviations of the estimator of the histogram from its true value. 
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FIG. 16 The rescaled and binned histogram ^{3)3^ /Vil), where r* = 2.10 for 9'^ = 125, 250, 500, 32000, 64000 (as 
indicated) in a double logarithmic plot. The linear size L is chosen according to the bold printed entries in Tab. Ill and 
large enough to ensure absence of finite size effects. The error-bars are estimated from shorter runs. The rightmost histogram 
(dotted, = 64000) could not be cross-checked by another run, see text. The dashed lines belong to different exponents, 
whose value is specified as the sum of the slope in the diagram and r*, i.e. a horizontal line would correspond to an exponent 
2.1. The shortly dashed lines represent estimated exponents for different regions of the histogram (2.22 for s within approx. 
[20, 200] and 2.19 for s within [200, 2000]), the other exponents are from literature, namely 2.14(3) in (Clar et al, 1994, 1996) 
and 223/91 ~ 2.45 in (Schenk et al., 2002). Since it was impossible to relate these exponents to any property of the data, the 
exact position of the lines associated with them was chosen arbitrarily. 



3. RESULTS 

The sections above were only concerned with the technical issues of the model and its implementation. Some of 
the actual results from the simulation carried out using the new algorithm have been published already (Pruessner 
and Jensen, 2002a). This article was focused on n(s). The main outcome was that the standard scaling assumption 
(12) is not supported by numerics, so the main conclusion was that the model is not scale invariant. 

In the following these results are shortly restated and discussed. Other observables are connected with this ob- 
servation to see, whether it is only n[s) which lacks scale-invariance. All results presented are based on the same 
simulations, the parameters of which are given in Tab. III. 

3.1. Cluster size distribution 

Before the actual findings are discussed, it is important to consider how to avoid finite size effects, which otherwise 
might damage the results. Usually, finite size effects are avoided by keeping the correlation length ^ small compared to 
the system size. However, it requires a significant amount of CPU-time to actually determine the correlation length. 
Moreover, a priori it would not be clear, which ratio ^/L to choose in order to avoid finite size effects. 

The simplest way to determine whether finite site effects are present is to compare estimates of observables for two 
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TABLE III Parameters and results for different choices of L and 9 ^ . The average cluster size is denoted by s, for definition 
see (1), but due to a truncation in the histogram for some of the simulations in the range 2000 < < 16000, the number 
presented is actually the average size of the burnt cluster. In the stationary state it is — apart from small corrections — also 
given by (1 — p)/{6p), see (7). Values of and L printed in bold indicate results shown in Fig. 16, the other results are only 
for comparison. All data are based on 5 x 10*^ (successful) updates (s. Sec. 2.2.1) for the transient and statistics, apart from 
those printed in italics which are based on short runs (5 x 10® updates for the transient and 1 x 10® updates for statistics). 
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systems with the same parameters but different sizes (Pruessner et ai, 2001). If finite size effects are not present, 
the differences between the estimators of those two systems must be within the error bar of the quantity under 
consideration. This approach has the drawback that each set of parameters must be simulated at least twice, but it 
gives full control over finite size effects. Apart from — 64000, which is specially marked in most of the plots, this 
approach has been apphed throughout the results presented. The method was discussed in greater detail in (Pruessner 
and Jensen, 2002a). 

Fig. 16 shows a central result of (Pruessner and Jensen, 2002a). This figures contains the reduced (and binned) 
data in the form 

^ (41) 

which has the convenient property to be unity for s = 1. The normalisation V^{1), which converges anyway to a finite 
value as 9^^ oo (see Tab. Ill), does not affect any of the results, especially not the (attempted) data collapses. 

The crucial problem shown in Fig. 16 is the intermediate minimum that develops as is increased. It renders 
the data collapse as described by (12) impossible (for more details see (Pruessner and Jensen, 2002a)). Fig. 17 shows 
the same data again, now in an attempt to form a data collapse, using So{9) = 9~^ with A* = 1.11 from Eq. (21) 
and T* = 2.10 (for comparison see Tab. IV). As expected the collapse fails. 
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s/s/e; 

FIG. 17 Attempt to collapse the data shown in Fig. 16 using r* = 2.10, sa{9) = 9~^' and A* = 1.11 as derived from Eq. (21). 
As expected the data do not collapse. The big arrow points in the direction of increasing 6^^. 



In less technical terms, it was shown in (Pruessner and Jensen, 2002a) that there is no choice of t, which allows 
a data collapse for V^{s;9). It seems that the distribution is the same for two different values of 61 up to a certain 
cluster size, which increases seemingly unbound with 9~^, i.e. for two very large values of the two distributions 
collapse without any rescaling. Beyond this cluster size the distributions deviate, the one with the larger forms 
a deeper dip and ascends afterwards to a maximum, which can, by rescaling, be arranged to be the same for all 9~^. 
The ever growing dip prohibits a reasonable definition of a lower cutoff and makes a data collapse impossible. Equally 
one could arrange the dips to be at the same height and the maximum to increase in 9^^. 

The key problem of the DS-FFM is that more than one length scale is visible apparently for any system size L. 
The statistics of n{s) is not even asymptotically dominated by a single length scale. For any system size, a n{s) only 
given for all s larger than any lower cutoff, allows the identification of 6 by the shape of n(s) alone. 

This indicates that simple scaling (12) docs not apply and the exponent r is undefined. Keeping this in mind, it is 
very instructive to look for other properties as well and investigate their scaling. 

3.1.1. Finite size scaling 

The failure of the DS-FFM to obey proper finite size scaling has been observed in (Schenk et ai, 2000) already. 
In the following some finite size scaling principles have been applied in a straight forward manner and subsequently 
ruled out. 

As known from percolation (Stauffer and Aharony, 1994), the generalised form of the scaling behaviour of sq is 

so{9,L) = 9-^m{9L'') (42) 

where m(x) is a crossover function describing the dependence of sq on the two parameters 9 and L. For sufhciently 
large argument x, the crossover function is expected to approach a constant, such that (20) is recovered. For small 
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arguments, however, the dependence of the cutoff is expected to be strongly dominated by L, just like in equilibrium 
critical phenomena, where L takes over the role of ^ for sufficiently small systems. Thus, for small arguments 
m{x) (X x^, so that for sufficiently small L, sq becomes independent of 9. 

Generic models of SOC do not have any tuning parameters other than the system size, so that the cutoff sq is only 
a function of L. In this sense, finite size scaling is the only scaling behaviour in SOC, and a failure of the model to 
comply to finite size scaling is identical to the failure to comply to simple scaling altogether. Therefore, one might be 
surprised to sec a simple scaling analysis and a finite size scaling analysis in an article on an SOC model. However, 
the Forest Fire Model is different in this respect, as it has the additional parameter 0, which is, supposedly, finite only 
because of the finiteness of the system size. In the thermodynamic limit it supposedly disappears as a free parameter. 

As seen above (see Fig. 17), the 6'-dependence of fi{s;6) can not be captured by sq in the scaling function alone. 
However, the scaling form (12) would remain valid in some sense, if in the finite size scaling regime the L dependence 
of n{s; 9) enters sq only. Therefore the original form (12) is generalised to 

n{s-9,L) = s-^g{s/so{9,L)) (43) 

ignoring that it has been shown above already that it does not hold in the limit where n(s; 9, L) becomes independent 
of L. In this section the dependence of n(s; 9, L) on L is investigated, in the limit of large 9~^ and small L. A similar 
study has been performed by Schenk et aL(Schenk et at, 2000), however on much smaller scales and using V^. 

If the form (43) holds, it should be possible to collapse n{s;9,L) for different L by choosing the correct r and Sq, 
just like for the cluster size distribution of standard percolation. This turns out not to be the case, as can be seen in 
Fig. 18: The smaller L is, the stronger the changes of shape of n{s) for any 9 tested. Consequently, (43) does not 
hold, and as sq is only defined via its role as cutoff in (43), sq is undefined and (42) remains meaningless. 

One might argue that the average density of trees, p (see (4)), is the relevant parameter of n{s), so that n{s) has 
the same shape for different, sufficiently small L and constant p. However, as shown in Fig. 20, for any value of 9, 
there is a value of L, such that p varies considerably with decreasing L. Especially, there seems to be a maximum 
tree density for every system size, so that for large values of p, there is a smallest system size L, below which this 
density cannot be reached. This maximum increases monotonically with system size, so that the maximum for every 
finite system size is smaller than the expected average tree density in the thermodynamic limit, which is according to 
Tab. HI larger than 0.40777 and was recently conjectured to be as large as 0.5927 . . . (Grassbcrgcr, 2002), namely the 
critical density of site percolation (Newman and Ziff, 2000). Accepting this limitation, Fig. 19 shows an example for 
three n{s) with roughly the same p and different L and 9. Most surprisingly two of the histograms collapse already 
without rescaling, while the third (L ~ 500) reveals the same problems as visible in Fig. 16. Hence, finite size scaling 
does also not work for fixed p. 

That large densities of trees cannot be reached by small system sizes is related to the specific way the histograms are 
generated and the density measured: Is it before or after each (successful) burning? For sufficiently large systems, it 
becomes irrelevant when to do it, because two histograms, one measured before, the other one right after the burning, 
differ only by one cluster. Also the question, whether to average only over successful burnings is irrelevant, because 
the difference between a histogram before and after the burning is only one cluster. 

Clearly, for small systems, the difference between the histogram before and after the burning, is just the one 
enormous cluster of size 0(9~^). Fig. 21 shows the difference. Even though in principle every density is reachable for 
every system size if the histogram is measured before burning, the newly defined histograms do not have a considerably 
different shape, so that a collapse remains impossible. For example, the problems shown in Fig. 18 become even more 
pronounced, if the histogram is taken before burning. 

Surprisingly and actually in contradiction to what has been said in Eq. (6), there is a discrepancy between the cluster 
size distribution of burnt clusters, V^, and the overall cluster size distribution T'^, even if the latter is measured before 
the burning takes place. This sounds paradoxical, because the random picking of a cluster to be burnt is just a 
sampling of T"^. This cannot be caused by the correlation between those samples, due to the fact that nt+i{s) is 
actually a function of the cluster chosen at t — a correlation like this would be equally picked up by T"^. The reason 
for this discrepancy is the fact that a site picked randomly as the starting point of the next fire is necessarily occupied. 
Therefore nt{s) with a low occupation density enter ovcr-wcightedly. As low density states contain much more 
small clusters then large ones, overestimates the probability of small clusters. A sample of at a low density is 
indistinguishable from a sample at high denisty, while a sample for trivially contains the information about the 
density. To illustrate that, one might imagine a sequence of configurations that consists of one state, with exactly 
one cluster of size 1, and a second state, with exactly one cluster of size L^. The two configurations appear with a 
frequency such that a cluster of size 1 is burnt down as often as a cluster of size . The resulting reports that 
a randomly chosen site belongs to a cluster of size with probability ^ and to a cluster of size 1 with probability 
1/(2L^), while incorrectly reports the same probability for both cluster sizes. The problem can actually already 
be spotted in (6), which contains a p on the RHS, which should rather be p{t). The problem disappears in the limit 
where p{t) hardly changes in time, i.e. in the limit of 9~^ <C L^. 



27 



* 




FIG. 18 Plot of the rescaled PDF P^(s; L)s^* /^''(l; ^. -f-) for fixed 6'^ = 1000 and different system sizes, L = 
125, 250, 500, 1000. The different shapes make it impossible to collapse the data, as would be expected from a finite size 
scaling ansatz (43) and (42). 



It is also clear, why (7) breaks down for small systems and large ^: The average size of the burnt cluster tends 
to L^, while the density tends to 0. Apparently (7) must be incorrect for p < + 1)^^. 



3.1.2. Scaling of the moments of 

According to (12), (20) and (8) the nth moment of should scale like (this analysis has apparently been introduced 
to SOC by De Menech et al.(De Menech et at, 1998; Pastor-Satorras and Vespignani, 2000; Tebaldi et at, 1999)) 



~ ^ Es^s^Ms- e) ^ q^^Q-x(2+n-r) ^ corrcctions 



(44) 



where (?„ is a non-universal amplitude (see section 3.1.3) and A is also known as the gap exponent (Pfcuty and 
Toulouse, 1977). The corrcctions arc due to the lower cutoff and the asymptotic character of the scaling, which is 
expected only for "sufficiently small 9" (Wegner, 1972). In turn, one can infer a scaling form like (12) if the moments 
scale in the form of (44). 

Contrary to what is observed in an attempt of a data collapse, it turns out that the moments follow beautifully 
this scaling behaviour. Fig. 22 shows the scaling of the moments for n = 2, 3, 5, 10. By simply fitting the double 
logarithmic data to a straight line, i.e. 

log(?^) =a;-a„log(0) (45) 

one can derive an estimate of the exponents fT„ and in turn compare them to the expected linear behaviour: 

CT„ = A(2 + n - t) . (46) 

The resulting estimates, using n = 2, . . . , 8 and ai = 1 from (1), are A = 1.0808 . . . and r = 2.0506 . . . , where no 
statistical error is given because the systematic error, due to neglecting of the lower cutoff as well as the corrections 
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FIG. 19 Plot of the rescaled PDF V'' {s; 9, L)s^' /V'il; 9, L) for fixed p « 0.397: L = 500 with 1/9 = 2000 (p = 0.396827), 
L = 1000 with 1/61 = 940 (p = 0.396825) and L = 4000 with 1/6 = 870 (p = 0.396883). Again, a data collapse is impossible. 



(44), is expected to be much more important. By using the assumption cti = 1, this result is consistent with (21). 
The results are shown in Fig. 23. 

The exponent found for t is remarkably close to the accepted value of standard 2D percolation, 187/91 = 
2.054945.... However, if one leaves out the results for = 64000, which seem to be a bit off the lines shown 
in Fig. 22, one finds a slightly larger value for the exponent, namely r = 2.0864 and A = 1.0998 .... This is much 
closer to the t* = 2.10 used above. For comparison to values found in the literature, see Tab. IV. 

It is very remarkable that the resulting estimates for the exponents are so impressingly consistent, even though 
in section 3.1 it turned out, that the scaling assumption (12) does not actually hold; one would much rather expect 
a failure of the moments to comply with (44), or a failure of the exponents to comply with (46). Apparently the 
moments are hiding the breakdown of simple scaling. Therefore it is interesting to analyse the behaviour of the 
presumably universal amplitude ratios, which are solely a property of the (presumed) scaling function. 

Another explanation for the moments being well behaved is the following: According to (Prucssner and Jensen, 
2002a) one might expect the moments to behave like 

dsf{s)s"+ / dss""^g(s/6l-^"""') (47) 

where the first integral describes the behaviour up to the minimum, which scales like 6/-^™*" (x„i„ « 0.95) and the 
second integral the behaviour from the minimum on. Because Fig. 17 indicates already that the scaling function 
Q does not collapse using a scale 6/~^"">'' this scaling does not work and can therefore be only an approximation. 
While the first integral is bound by C'(0^^>nin("+i)) the second integral gives O(0-^max(i+n.--r)^ asymptotically, which 
dominates the moments for x„-,i„(n + 1) < x„-^^^{l + n — r), which leads to n > 9.08 using x^^^ « 1.2 and r « 2.1. 
Fig. 22 shows clearly a deviation from the straight line behaviour for = 64000 and n = 10 and even for n = 5. It 
remains unclear whether this is due to the effect discussed or simply a finite size problem. According to the findings 
presented in section 3.1.3 the latter might well be the case. 
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FIG. 20 The average density of trees, p, as a function of 9 and for various L. For sufficiently small systems, the maximum 
in p is much smaller than the expected density at the "critical point", which is larger than 0.40777 found as in Tab. III. The 
straight line marks p = 0.396827, the density chosen in Fig. 19. The inset is a magnification of the crossing of the straight line 
with the simulation data, and shows all three values of 0, L used in Fig. 19. 



TABLE IV Exponents of the Forest Fire Model found in the literature. The first column indicates the source, the second 
column the method. P{s) denotes a direct analysis of n{s;0), which sometimes may have been just an estimate of the slope 
of n{s;0) rather than a data collapse. For details the original sources should be consulted. The entry "moments" refers to an 
analysis of the moments of P{s), the entry "theoretical" to theoretical considerations regarding the relation of the Forest Fire 
Model to percolation. 



reference 


method 


r 


A 


Christensen et a/. 1993 (Christensen et al, 1993) 


Pis) 


2.16(5) 




Henley 1993 (Henley, 1993) 


P{s) 


2.150(5) 


1.167(15) 


Grassberger 1993 (Grassberger, 1993) 


Pis) 


2.15(2) 


1.08(2) 


Clar et al.l994 (Clar et al, 1994) 


Pis) 


2.14(3) 


1.15(3) 


Honecker and Peschel 1997 (Honecker and Peschel, 1997) 


Pis) 


2.159(6) 


1.17(2) 


Pastor-Satorras Vespignani 2000 (Pastor-Satorras and Vespignani, 2000) 


moments 


2.08(1) 


1.09(1) 


Schenk et a/.2002 (Schenk et al, 2002) 


theoretical and P{s) 


2.45 . . . 


1.1 


Grassberger 2002 (Grassberger, 2002) 


Pis) 


2.11 


1.08 



It is worthwhile pointing out, that the analysis in this section arrives at estimates for the critical exponents very 
close to those obtained by Pastor-Satorras and Vespignani (Pastor-Satorras and Vespignani, 2000), who, however, 
allow for the corrections in (44) which were omitted above. 

3.1.3. Universal amplitude ratios 

In general simple scaling involves two additional non-universal parameters a and 6, 

nis;9)^as-^g(^-^^ . (48) 
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FIG. 21 Comparison between the rescaled and binned histograms measured before and after the burning for small L = 125 
and large = 1000. As expected, only the statistics for large s is affected. The dashed line shows the data for V^l^s). 



For 1 < r < 2 the lower cutoff becomes asymptotically irrelevant compared to the upper cutoff for all moments n> 1 
— indeed the effective t of sn(s; 6) fulfils this condition as 2 < t < 3 (Clar et ai, 1996). Neglecting the lower cutoff 
then gives for the nth moment of sh{s; 6) 

7^ = a{bsoY+'---g^ (49) 

with 

gn= dxx^+'^-''g{x) (50) 
Jo 

In order to construct universal amplitude ratios, one needs to get rid of all exponents and parameters. This can be 
achieved by considering 

,(l-r)\(l-"/2) 5^ 



and (Eq. (51) with n ~ 1) 



(53) 



^ 92 

If one now multiples (51) with the n — 2th power of (52), everything cancels apart from the Qn 

(s2)n/2 (s2)(n-2)/2 g^-^ 
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FIG. 22 Scaling of the nth moments of in double logarithmic plots. The straight lines show the results of a fit as exp(aj,)^~'^" , 
see (44). 



It is worth noting that for a trivial case, where s" oc s", the effective exponent t is necessarily unity and (51) as well 
as (52) are already independent of 9. 

A further simplification is to impose gi = \ and 92 = 1, which fixes the two free parameters a and b in (48), so that 



gn = 



s" (s) 



(«-2) 



(n-l) 



(54) 



for n > 1. In Fig. 24 this quantity is shown for n = 3, 4, 5, 6. Now, for = 64000 a deviation is clearly visible — in 
turn that means that 9^"^ = 64000 requires at least systems of the size L ~ 64000, which might explain the large value 
of p obtained in (Grassberger, 2002). Apart from that, this analysis agrees with the result found in Sec. 3.1: The 
supposedly universal amplitude ratios keep changing with 9 and an asymptote cannot be estimated, i.e. the scaling 
(12) is broken. 



3.1.4. Burning tinne distribution 

Another distribution of interest is the distribution of burning times, Pt„(Tm;^). The statistics are comparatively 
small for this quantity, as the burning time is defined only for the cluster removed. However, they still seem to be 
good enough to allow us to make a statement about their scaling behaviour. The rescaled data, Ptm(7m; &)Tm with 
a trial exponent b* ~ 1.24 can be seen in Fig. 25. The intermediate part of the distribution between Tm ~ 4 and 
the maximum seems to bend down as 9~^ increases, but the developing dip is much less pronounced than in Fig. 16. 
Nevertheless, the region where a data collapse seems possible moves out towards larger values of Tm, which again 
prohibits simple scaling. Assuming that the bending might become weaker for sufficiently large Tm leads to a data 
collapse shown in Fig. 26, using an exponent i^' = 0.6 as defined in Eq. (22). However, only for values of Tm « Tmq 
the data possibly collapse. Again, this violates the assumption of simple scaling, namely that there is a constant lower 
cutoff above which the behaviour is universal. 

The only remaining exponent of those defined in sec. 2.2.4, ^' , relates the statistics of s and Tm. It requires the 
bivariate distribution P{s,Tm_',9), as the exponent is derived from E(Tm|s) (x s^^'^ , which is essentially equivalent to 
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FIG. 23 Exponents (t„ of the scaling of s" in 9 vs. n. The slope of this curve gives A and r can be derived from the offset. The 
straight, full line shows the results A = 1.0808 . . . and r — 2.0506 . . . , the dashed line shows A — 1.0998 . . . and r — 2.0864 . . . 
from a fit excluding = 64000. 



Eq. (17). The distribution P(s,Tm;6') is shown in Fig. 27. At first glance the assumption of a power law dependence 
of s and Tm seems to be confirmed. Also the width of the distribution seems to be very small, with almost no change 
over 5 orders of magnitude in s. However, the plot is double logarithmic, so that the width roughly scales like the 
slope, which is about 0.6, as shown by straight lines. This matches perfectly the exponent chosen to rescale P (see 
caption of Fig. 27). 

By inspecting E(Tm|s;0) and E(s|rM;^) for various 0, one can determine /i' as slope in a double logarithmic plot. 
Fig. 28 shows that fi' remains ambiguous and deviations from the expected behaviour do not vanish as is increased. 
Asymptotically one might expect l//i' « 0.62, while (r* — 2)/(6* — 1) suggests 1/^' w 0.417. The value of 0.62 is 
consistent with the rough estimate 0.6 made in Fig. 27. Fig. 28 also shows two other exponents, 0.53 and 0.7, the 
former being in line with the value found in literature of 0.529(8) (Clar et at, 1994). 

Conclusively it is noted that the other observable available in this study, Tm , does not seem to provide an alternative 
way to ascribe the DS-FFM critical behaviour in the sense of the scaling behaviour as proposed in the literature. 

3.2. Tree density as a function of time 

As mentioned above (see section 3.1.1), the density of trees, p, is actually a function of time. Initially, it is periodic 
around the average value, with an amplitude that depends mainly 9. This amplitude decays in time and after 
sufficiently long times p{t) looks like a random walk around p. 

Fig. 29 illustrates how the period and the amplitude depends on 9 and L: The period is proportional to 9L^, while 
the amplitude mainly depends on 9, i.e. the strength of the influx cx 9^^. The reason for the former is easy to 
understand: 9~^ /L^ is proportional to the fraction of newly grown trees (Honecker and Peschel, 1997); the change of 
the tree density is roughly 



d 



1-p 1 
p 9L^ 



(55) 
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FIG. 24 The supposedly universal amplitude ratio Qn (54) for n = 3, 4, 5, 6. The error bars are based on a Jackknife scheme 
(Efron, 1982; Pruessner et al, 2001) using a roughly estimated correlation time of 50, see Tab. 11. 



assuming that it hardly changes during the growing. Otherwise, one would have to introduce a microscopic timescales, 
which makes it possible to measure the tree density on the timescale on which the trees are grown. The pre-factor 
(1 — p)/p takes into account that only empty sites can be re-occupied and that an occupied site is required for the 
burning to start. The second term on the right hand side, ri[p(t),t), is a noise, which represents the burning of the 
trees. From this equation one can already expect that the period is roughly linear in 9L^p/{l — p). This has already 
been measured in detail by Honecker and Peschel (Honccker and Peschel, 1997); the numerical results presented here 
(Fig. 29) arc fully consistent with their results. 

Apart from the relevance of the periodic behaviour for the equilibration time, the periodic behaviour of p{t) is 
physically of great significance: What distinguishes the state of the system for a given p at the ascending and the 
descending branches? Trivially, the sequence of configurations of the system is Markovian, while the tree density 
alone as a time series is certainly not. The configuration somehow manages to "remember" whether the tree density 
was increasing or decreasing during the last update, in order to keep p{t) periodic. 

One explanation for this behaviour might be a "growing-and-harvesting" concept: From the initially completely 
random tree distribution larger and larger patches are formed, so that larger and larger patches are harvested by 
lightning. When the density reaches the maximum, for a while the patches harvested remain large compared to the 
amount grown. This is because the growing process does not actually produce those large patches itself, but makes 
them available to the harvesting by continuously connecting smaller patches in areas, where the lightning has not 
yet struck. This process goes on, until almost all the trees are newly grown, i.e. the trees are distributed almost 
randomly, apart from the spatial correlation in density. The period of this process would be proportional to the time 
it takes to renew the entire system, which is LF'9p/{l — p). namely divided by 's, see (7). 

The time-dependent tree density gives only a hint of what actually happens in the system. It would be very 
instructive to study the two-point correlation function as a function of time to answer the question, whether the 
explanation above is actually valid. 
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FIG. 25 The rescaled probability distribution of the burning time, Ptm {Tm', 9). Similar to Fig. 16 a dip seems to form between 
the low Tm region and the maximum, which again renders a data collapse impossible. 



3.3. Discussion 



From the results presented above it becomes clear that the Forest Fire Model does not show the scaling behaviour 
expected for a system, which becomes critical in the appropriate limit (namely L — > oo and 9^^ oo). One might 
argue that another scaling ansatz could lead to a distribution which is asymptotically scalefrce in this limit, for 
example a multifractal ansatz (Tcbaldi et ai, 1999) or the one proposed in (Schenk et al, 2002), where more than 
one scale is assumed to govern the model. For an asymptotically scalefree distribution, the scales have to diverge or 
to vanish in the appropriate limit. It has been suggested already very early (Honecker and Peschel, 1997) that more 
than one characteristic length scale can be found in the Forest Fire Model. 

However, changing the scaling assumption would entail a new definition of the exponents r, D etc., which would 
therefore prohibit comparison with other results, which are based on the assumption of simple scaling (12). Moreover, 
introducing multiple scales would stretch the notion of universality, especially the universality of the scaling function, 
to its limits. As can be seen in Fig. 16, the shape of the distribution function is not universal^ i.e. the shape of this 
function is unique for every single 9~^, even for L oo. This is in direct contradiction to the concept of universality, 
scaling and scale invariance. 

However, it might be possible to reestablish simple scaling by introducing another mechanism in the model, as was 
done for example in the "autoignition Forest Fire model" (Sinha-Ray and Jensen, 2000). If there were, for example, 
a mechanism parameterized by u, such that 

n(s; 0, u) = s''g{s/so{0, u)) (56) 

then simple scaling might be reestablished possibly by choosing an appropriate u = u{9); even the cutoff, sq, which 
were assumed to diverge with 9~^, would then effectively depend only on 0. Currently, there is no hint, what this 
new parameter u could be. 

Lise and Paczuski (Lise and Paczuski, 2001b) suggested for a similar problem in the OFC model (Olami et al., 1992) 
to define an exponent r by the slope of the distribution V^{s), imposing the remaining background, J-(s,L,0~^), to 
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FIG. 26 Attempt of a data collapse for Pt,^{Tm',6)- Only at the far end of the scaling function at the descent from the 
maximum, the data seem actually to collapse. This, however, is not sufficient for a data collapse. The big arrow points in the 
direction of increasing . 



be as straight as possible: 

In (7'^(s)) = -T/n(s) + T{s, L, 6"^) (57) 

This ansatz, in fact based on a multiscahng ansatz, would indeed allow the measurement of an exponent, however, 
with some degree of ambiguity. The crucial problem with this approach is that, firstly, it again does not allow any 
direct comparison to other models, where the exponents are defined via (12) and that, secondly, the notion of a 
presumably universal exponent hides the fact of broken scaling. 

From Section 3.1 one might conclude that there docs not even exists a limiting distribution for n{s;6). However, 
even if it exists, that does not mean that simple scaling is obeyed and if it does, it is still open whether the exponents 
are non-trivial or not and whether the model posses any spatio-temporal correlation which do not vanish on sufficiently 
large scales. 

4. SUMMARY 

Using a new method for simulating the Forest Fire Model on large scales, it is possible to make clear statements 
about the validity of the scaling assumption of this model. The two observables investigated in this paper suggest the 
model docs not develop into a scale invariant state. 

The method is based on the Hoshcn-Kopelman algorithm (Hoshen and Kopclman, 1976) and uses a master/slave 
parallelisation scheme to simulate the model on very large scales and very large sample sizes. The key to the 
parallelisation is to decompose the lattice in strips and to encode the connectivity of these strips in the border sites. 
Clusters crossing these strips are then maintained by the master node, while clusters within a strip are maintained 
on the local nodes. There is almost no data exchange apart from the border configuration, which lowers the impact 
on the network linking the nodes. 

The resulting distribution T"^{s) is, different from other simulations found in the literature, the distribution of 
all clusters in the system, rather than just the burnt clusters. The resulting statistics then allows to draw clear 
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FIG. 27 Binned density plots of P(s,rM; for different values of S on a double logarithmic scale. High densities are presented 
as dark areas. For better presentation, P(s,Tm; Q) has been multiplied by a factor s^'^, tilting the distribution similar to those 
shown in Fig. 16, so that the second maxima in the distribution, those at large s and Tm, are roughly as high as the first maxima, 
i.e. they show in the plot as dark as around s = 5. Since P(s,Tm; is a histogram only of burnt clusters, it contains a factor 
s compared to n(s) (see discussion around (4)). Therefore, the exponent 2.7 needs to be compared to r* = 2.10, indicating 
that the width of P(s, Tm; &) roughly scales like s" *^, so that the reduced height of P(s, Tm; G) is caused by an increase in width. 
This coincides well with the slope of the distribution, as shown by a straight line. Thus, the relative width remains roughly 
constant. 



conclusions as to what extend the model does actually obey the scaling assumption. This turns out not to be case. 
The violation of scaling is also observed in the distribution of the burning time. Conclusively we find that there is no 
reason to assume that the Drossel-Schwabl Forest Fire Model develops into a critical state. This is in line with the 
conclusion by Grassberger (Grassberger, 2002), who however, still finds some signs that the Forest Fire Model will 
finally show some characteristics of standard percolation. 
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